An Operand Status Based Instruction Steering Scheme for Clustered Architectures
Abstract
Clustered architectures, which aim to process data within a localized processing element (PE), are one approach to increasing performance in the face of wire delay. The performance of a clustered architecture depends on its instruction steering scheme. Existing steering schemes insert inter-PE communications to balance load among PEs. These insertions delay the execution of dependent instructions and degrade performance. In this paper, we propose a novel instruction steering scheme that gives priority to critical dependencies. Critical dependencies are identified by observing the status of an instruction's source operands. We evaluate the proposed scheme and compare it with existing ones. The results show that the proposed scheme outperforms existing schemes in instructions per clock (IPC) because it reduces critical inter-PE communications while achieving superior load balance among the PEs.
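The abstract does not spell out the steering algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a source operand that is not yet ready at steering time is treated as a critical dependency, the instruction follows that operand's producer PE to avoid an inter-PE transfer on the critical path, and an instruction whose operands are all ready goes to the least-loaded PE. The names `Operand`, `Instruction`, `steer`, and `pe_load` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operand:
    producer_pe: int  # PE that produces (or holds) this value (assumed field)
    ready: bool       # True if the value is already available at steering time


@dataclass
class Instruction:
    name: str
    sources: List[Operand]


def steer(instr: Instruction, pe_load: List[int]) -> int:
    """Pick a PE index for `instr`; `pe_load[i]` is the queue length of PE i.

    Illustrative heuristic only, not the paper's exact scheme.
    """
    # Assumption: operands still in flight mark the critical dependencies.
    pending = [op for op in instr.sources if not op.ready]
    if pending:
        # Follow a pending operand's producer; break ties by lowest load.
        return min((op.producer_pe for op in pending), key=lambda pe: pe_load[pe])
    # All operands ready: no critical dependency, so balance load instead.
    return min(range(len(pe_load)), key=lambda pe: pe_load[pe])


pe_load = [3, 1, 2, 0]
add = Instruction("add", [Operand(0, ready=False), Operand(1, ready=True)])
mov = Instruction("mov", [Operand(2, ready=True)])
print(steer(add, pe_load))  # follows the pending operand's producer
print(steer(mov, pe_load))  # falls back to the least-loaded PE
```

In this sketch the ready operand of `add` is ignored when choosing a PE, on the assumption that copying an already available value is less likely to sit on the critical path than waiting for one still being computed.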
Similar resources
Compiler-assisted power optimization for clustered VLIW architectures
Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...
Improving Dictionary-Based Code Compression in VLIW Architectures
Reducing code size is crucial in embedded systems as well as in high-performance systems to overcome the communication bottleneck between memory and CPU, especially with VLIW (Very Long Instruction Word) processors that require a high-bandwidth instruction prefetching. This paper presents a new approach for dictionary-based code compression in VLIW processor-based systems using isomorphism amon...
Pragmatic integrated scheduling for clustered VLIW architectures
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Scheduling for clustered architectures involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). Various clustered VLIW configurations, connectivity types, and inter-cluste...
PALF: compiler supports for irregular register files in clustered VLIW DSP processors
In recent years, a wide variety of register file architectures developed for embedded processors have aimed at reducing power dissipation and die size, in contrast with traditional unified register file structures. This article presents a novel register allocation scheme for a clustered VLIW DSP, which is designed with distinctively banked register files in which port access is h...
Code generation for a Coarse-Grained Reconfigurable Architecture
Good tool support is essential for computing platforms because it increases programmability. This is especially the case for reconfigurable architectures because applications need to be mapped on the architecture for each configuration individually. This paper introduces a compiler backend for Coarse Grained Reconfigurable Arrays (CGRA) based on LLVM. The CGRA compiler must be retargetable to ...